Context-Aware Coordination of Cascaded Machine Translations

Authors

  • Toru Ishida
  • Rie TANAKA
Abstract

More and more people now communicate over the Internet, and machine translators are used as communication tools between people who cannot speak each other's native language. Because English is the hub language for the development of language resources such as machine translators and bilingual dictionaries, combining multiple machine translators via English enables intercultural communication and collaboration between speakers of non-English languages. Combining multiple machine translators, however, sometimes causes problems that interfere with communication. Cascaded machine translators often yield mistranslations caused by inconsistent word selection, even when all the translators are combined correctly and each individual translation result is correct. This is because each translator considers only its own input sentence. This phenomenon is a serious problem both for multi-hop translation, which cascades multiple translators, and for machine-translator-mediated communication. Coordination of cascaded machine translators is therefore needed. To resolve these problems, this research addresses the following issues. First, examining whether the sense of a translated sentence differs from that of the source sentence requires equivalent terms across all the languages involved. Equivalent terms between two languages have been developed as bilingual dictionaries for many language pairs, whereas equivalent terms across more than three languages have been developed manually and only for some languages. This research therefore aims to generate multilingual equivalent terms automatically from existing language resources. Second, coordination of translators, that is, consistent word selection, can be realized by extracting context and propagating it to the machine translators. Context is extracted from the source sentence or from the whole document that contains it. The sense of the translated sentence is kept consistent by selecting translated words that suit the propagated context. Methods for extracting context have been proposed in previous research.
This research assumes that the context has already been extracted and focuses on coordination by propagating the extracted context. To address these two issues, this research proposes the following solutions. First, it proposes a method for obtaining multilingual equivalent terms by combining multiple bilingual dictionaries. Relations between words and their translations are represented as a graph, and equivalent terms are obtained from the structure of that graph. Simply combining multiple dictionaries can also yield sets of terms that do not share the same sense; such errors can be prevented by taking the structure of the graph into account. Second, it proposes a method for coordinating cascaded machine translators so that translated words are selected based on context propagated by means of the multilingual equivalent terms. Each context extracted from the sentence is represented in …
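The idea of obtaining equivalent terms from the structure of a dictionary graph can be illustrated with a minimal sketch. The abstract does not say which graph structure is used, so the criterion below (keep a three-language chain only when its two end words are also linked directly, forming a closed triangle) is an assumption made for illustration. The toy dictionaries, the romanized Japanese words, and the function names `naive_chain` and `closed_triples` are likewise hypothetical.

```python
# Hypothetical toy bilingual dictionaries, each a set of (word, translation) pairs.
# Japanese words are romanized: kawa = river, ginkou = financial bank, dote = riverbank.
JA_EN = {("kawa", "river"), ("ginkou", "bank"), ("dote", "bank")}
EN_DE = {("river", "Fluss"), ("bank", "Bank"), ("bank", "Ufer")}
DE_JA = {("Fluss", "kawa"), ("Bank", "ginkou"), ("Ufer", "dote")}


def naive_chain(ja_en, en_de):
    """Join two dictionaries through the English pivot, ignoring graph structure."""
    return {
        (ja, en, de)
        for (ja, en) in ja_en
        for (en2, de) in en_de
        if en == en2
    }


def closed_triples(ja_en, en_de, de_ja):
    """Keep only chains whose end words are also linked directly (assumed criterion)."""
    return {
        (ja, en, de)
        for (ja, en, de) in naive_chain(ja_en, en_de)
        if (de, ja) in de_ja
    }


naive = naive_chain(JA_EN, EN_DE)
equivalent = closed_triples(JA_EN, EN_DE, DE_JA)

for triple in sorted(naive):
    status = "kept" if triple in equivalent else "rejected"
    print(triple, status)
```

In this toy data the ambiguous pivot word "bank" links ginkou to Ufer and dote to Bank when the dictionaries are simply chained. Neither pair shares a sense, and both are rejected because the German-Japanese dictionary contains no direct edge that closes the triangle.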

Similar resources

Towards Coordination of Multiple Machine Translation Services

Machine translation services available on the Web are getting increasingly popular. Multiple translation services are often combined for intercultural communication. Furthermore, since English is most frequently used as a hub language, it is often necessary to cascade different translation services to realize translations between non-English languages. As a result, the word sense is often chang...

Virtual Babel: Towards Context-Aware Machine Translation in Virtual Worlds

In this paper, we describe our ongoing research project of Virtual Babel, a context-aware machine translation system for Second Life, one of the most popular virtual worlds. We augment the Second Life viewer to intercept the incoming/outgoing chat messages and reroute the message to a statistical machine translation server. The returned translations are appended to the original text message to h...

The UPC TweetMT participation: Translating Formal Tweets Using Context Information

In this paper, we describe the UPC systems that participated in the TweetMT shared task. We developed two main systems that were applied to the Spanish–Catalan language pair: a state-of-the-art phrase-based statistical machine translation system and a context-aware system. In the second approach, we define the “context” for a tweet as the tweets of a user produced in the same day, and also, we ...

Statistical machine translation with cascaded probabilistic transducers

Statistical machine translation is based on the idea to extract information from bilingual corpora, which can be used to generate new translations. The current work combines aspects from example-based machine translation and from grammar-based approaches, esp. bilingual regular grammars, to develop a statistical translation system based on cascaded transducers. These transducers can be construc...

Context-aware Discriminative Phrase Selection for Statistical Machine Translation

In this work we revise the application of discriminative learning to the problem of phrase selection in Statistical Machine Translation. Inspired by common techniques used in Word Sense Disambiguation, we train classifiers based on local context to predict possible phrase translations. Our work extends that of Vickrey et al. (2005) in two main aspects. First, we move from word translation to ph...

Publication date: 2008